amino acid sequence
AVIDa-hIL6: A Large-Scale VHH Dataset Produced from an Immunized Alpaca for Predicting Antigen-Antibody Interactions
Antibodies have become an important class of therapeutic agents for treating human diseases. To accelerate therapeutic antibody discovery, computational methods, especially machine learning, have attracted considerable interest for predicting specific interactions between antibody candidates and target antigens such as viruses and bacteria. However, the publicly available datasets in existing works have notable limitations, such as small sizes and the lack of non-binding samples and exact amino acid sequences. To overcome these limitations, we have developed AVIDa-hIL6, a large-scale dataset for predicting antigen-antibody interactions in the variable domain of heavy chain of heavy chain antibodies (VHHs), produced from an alpaca immunized with the human interleukin-6 (IL-6) protein as the antigen. By leveraging the simple structure of VHHs, which facilitates identification of full-length amino acid sequences by DNA sequencing technology, AVIDa-hIL6 contains 573,891 antigen-VHH pairs with amino acid sequences. All antigen-VHH pairs have reliable labels for binding or non-binding, generated by a novel labeling method. Furthermore, through the introduction of artificial mutations, AVIDa-hIL6 contains 30 different mutants in addition to the wild-type IL-6 protein. This characteristic provides opportunities to develop machine learning models for predicting changes in antibody binding caused by antigen mutations. We report experimental benchmark results on AVIDa-hIL6 using machine learning models. The results indicate that existing models have potential, but further research is needed to generalize them to predict effective antibodies against unknown mutants. The dataset is available at https://avida-hil6.cognanous.com.
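To make the prediction task concrete, here is a minimal sketch of a binder/non-binder baseline over antigen-VHH sequence pairs. The feature choice (dipeptide composition), the toy sequences, and the classifier are illustrative assumptions, not the dataset's actual schema or the paper's benchmark models.

```python
from collections import Counter
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def dipeptide_features(seq):
    """Normalized dipeptide-composition vector (400 dims) for one sequence."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return np.array([counts[d] / total for d in DIPEPTIDES])

def pair_features(antigen, vhh):
    """Concatenate antigen and VHH feature vectors into one model input."""
    return np.concatenate([dipeptide_features(antigen), dipeptide_features(vhh)])

# Toy stand-in pairs; real use would load the AVIDa-hIL6 records instead.
pairs = [
    ("MNSFSTSAFGPV", "QVQLVESGGGLV", 1),
    ("MNSFSTSAFGPV", "EVQLQASGGGFV", 0),
    ("MNSFSTSAFGPA", "QVQLVESGGGLV", 1),
    ("MNSFSTSAFGPA", "AVQLVDSGGGSV", 0),
]
X = np.stack([pair_features(a, v) for a, v, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:1])[0, 1])  # predicted binding probability
```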
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Generation
Proteins are essential for almost all biological processes and derive their diverse functions from complex 3D structures, which are in turn determined by their amino acid sequences. In this paper, we exploit the rich biological inductive bias of amino acid sequences and introduce FoldFlow++, a novel sequence-conditioned $\text{SE}(3)$-equivariant flow matching model for protein structure generation. FoldFlow++ presents substantial new architectural features over the previous FoldFlow family of models, including a protein large language model to encode sequence, a new multi-modal fusion trunk that combines structure and sequence representations, and a geometric transformer based decoder. To increase the diversity and novelty of generated samples -- crucial for de novo drug design -- we train FoldFlow++ at scale on a new dataset that is an order of magnitude larger than the PDB datasets of prior works, containing both known proteins in the PDB and high-quality synthetic structures obtained through filtering. We further demonstrate the ability to align FoldFlow++ to arbitrary rewards, e.g.
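For readers unfamiliar with the objective such models build on, here is a minimal sketch of conditional flow matching, reduced to plain R^3 coordinates with a toy velocity network. The real model operates on SE(3) frames and conditions on a protein language model embedding of the sequence; both are simplified away here, and all module names are illustrative.

```python
import torch
import torch.nn as nn

class TinyVelocityNet(nn.Module):
    """Predicts a velocity field v(x_t, t, c) for N residues in R^3."""
    def __init__(self, n_residues, cond_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_residues * 3 + 1 + cond_dim, 128),
            nn.SiLU(),
            nn.Linear(128, n_residues * 3),
        )

    def forward(self, x_t, t, cond):
        inp = torch.cat([x_t.flatten(1), t[:, None], cond], dim=-1)
        return self.net(inp).view_as(x_t)

n_res, batch = 32, 8
model = TinyVelocityNet(n_res)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x1 = torch.randn(batch, n_res, 3)   # "data" structures (toy placeholders)
cond = torch.randn(batch, 16)       # stand-in for a sequence embedding

for step in range(100):
    x0 = torch.randn_like(x1)       # noise sample
    t = torch.rand(batch)           # random time in [0, 1]
    # Linear interpolation path between noise and data...
    x_t = (1 - t[:, None, None]) * x0 + t[:, None, None] * x1
    # ...whose conditional velocity is simply x1 - x0.
    target_v = x1 - x0
    loss = ((model(x_t, t, cond) - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

At sampling time one would draw x0 from noise and integrate the learned velocity field from t = 0 to t = 1, with the sequence embedding held fixed as the condition.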
Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers
Varshney, Disha, Garg, Samarth, Tyagi, Sarthak, Varshney, Deeksha, Deep, Nayan, Ekbal, Asif
In this study, we tackle the challenging task of predicting secondary structures from protein primary sequences, a pivotal initial stride towards predicting tertiary structures, while yielding crucial insights into protein activity, relationships, and functions. Existing methods often utilize extensive sets of unlabeled amino acid sequences. However, these approaches neither explicitly capture nor harness the accessible protein 3D structural data, which is recognized as a decisive factor in dictating protein functions. To address this, we utilize protein residue graphs and introduce various forms of sequential or structural connections to capture enhanced spatial information. We combine Graph Neural Networks (GNNs) and Language Models (LMs), specifically utilizing a pre-trained transformer-based protein language model to encode amino acid sequences and employing message-passing mechanisms like GCN and R-GCN to capture geometric characteristics of protein structures. Employing convolution within a specific node's nearby region, including relations, we stack multiple convolutional layers to efficiently learn combined insights from the protein's spatial graph, revealing intricate interconnections and dependencies in its structural arrangement. To assess our model's performance, we employed the training dataset provided by NetSurfP-2.0, which outlines secondary structure in 3- and 8-states. Extensive experiments show that our proposed model, SSRGNet, surpasses the baseline on F1-scores.

Introduction

Proteins serve as essential components within cells and are involved in various applications, spanning from therapeutics to materials. They are composed of a sequence of amino acids that fold into distinct shapes. With the development of affordable sequencing technologies [1, 2], a substantial number of novel protein sequences have been identified in recent times. However, annotating the functional properties of a newly discovered protein sequence is still a laborious and expensive process. Thus, there is a need for reliable and efficient computational methods to accurately predict and assign functions to proteins, thereby bridging the gap between sequence information and functional knowledge. The analysis of protein structure, particularly the tertiary structure, is highly significant for practical applications related to proteins, such as understanding their functions and designing drugs [3].
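A hand-rolled sketch of the relation-aware message passing described above, with one weight matrix per edge type plus a self-loop, in the spirit of an R-GCN layer. The node features stand in for protein language model embeddings, and the two relations (sequential neighbor, spatial contact) use toy adjacency matrices; none of this is the authors' exact architecture.

```python
import torch
import torch.nn as nn

class RelationalGCNLayer(nn.Module):
    """One R-GCN-style layer: separate weights per relation plus a self-loop."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)
        )
        self.self_loop = nn.Linear(in_dim, out_dim)

    def forward(self, h, adjs):
        # adjs: list of (N, N) row-normalized adjacency matrices, one per relation
        out = self.self_loop(h)
        for adj, lin in zip(adjs, self.rel_weights):
            out = out + adj @ lin(h)
        return torch.relu(out)

n_res, lm_dim = 50, 320
h = torch.randn(n_res, lm_dim)  # stand-in for per-residue PLM embeddings

# Relation 1: sequential neighbors (i, i±1); relation 2: random "spatial contacts".
seq_adj = torch.diag(torch.ones(n_res - 1), 1) + torch.diag(torch.ones(n_res - 1), -1)
contact_adj = (torch.rand(n_res, n_res) < 0.05).float()
adjs = [a / a.sum(-1, keepdim=True).clamp(min=1) for a in (seq_adj, contact_adj)]

layer = RelationalGCNLayer(lm_dim, 128, num_relations=2)
head = nn.Linear(128, 8)               # logits for 8-state secondary structure
logits = head(layer(h, adjs))          # (n_res, 8) per-residue predictions
```

Stacking several such layers lets each residue aggregate information from progressively larger sequential and spatial neighborhoods before the per-residue classification head.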
Protein as a Second Language for LLMs
Chen, Xinhui, Li, Zuchao, Gao, Mengqi, Zhang, Yufeng, Leong, Chak Tou, Li, Haoyang, Chen, Jiaqi
Deciphering the function of unseen protein sequences is a fundamental challenge with broad scientific impact, yet most existing methods depend on task-specific adapters or large-scale supervised fine-tuning. We introduce the "Protein-as-Second-Language" framework, which reformulates amino-acid sequences as sentences in a novel symbolic language that large language models can interpret through contextual exemplars. Our approach adaptively constructs sequence-question-answer triples that reveal functional cues in a zero-shot setting, without any further training. To support this process, we curate a bilingual corpus of 79,926 protein-QA instances spanning attribute prediction, descriptive understanding, and extended reasoning. Empirically, our method delivers consistent gains across diverse open-source LLMs and GPT-4, achieving up to 17.2% ROUGE-L improvement (average +7%) and even surpassing fine-tuned protein-specific language models. These results highlight that generic LLMs, when guided with protein-as-language cues, can outperform domain-specialized models, offering a scalable pathway for protein understanding in foundation models.
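A rough sketch of the in-context idea: amino acid sequences are presented to a generic LLM as "sentences" alongside question-answer exemplars, with no fine-tuning. The exemplar triples and the prompt-building helper below are illustrative placeholders, not the paper's actual corpus or prompting format.

```python
def build_prompt(exemplars, query_seq, question):
    """Format (sequence, question, answer) triples as few-shot context."""
    parts = []
    for seq, q, a in exemplars:
        parts.append(f"Protein: {seq}\nQ: {q}\nA: {a}\n")
    # The query protein gets the same framing, with the answer left open.
    parts.append(f"Protein: {query_seq}\nQ: {question}\nA:")
    return "\n".join(parts)

exemplars = [
    ("MKTAYIAKQR", "What is the subcellular location of this protein?",
     "Cytoplasm."),
    ("MSLNFLDFEQ", "What is the subcellular location of this protein?",
     "Nucleus."),
]
prompt = build_prompt(exemplars, "MGDVEKGKKI",
                      "What is the subcellular location of this protein?")
print(prompt)  # pass this string to any chat/completion LLM endpoint
```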
Should we worry AI will create deadly bioweapons? Not yet, but one day
Should we worry AI will create deadly bioweapons? Artificial intelligence promises to transform biology, allowing us to design better drugs, vaccines and even synthetic organisms for, say, eating waste plastic. But some fear it could also be used for darker purposes, to create bioweapons that wouldn't be detected by conventional methods until it was too late. So, how worried should we be? "AI advances are fuelling breakthroughs in biology and medicine," says Eric Horvitz, chief scientific officer at Microsoft. "With new power comes responsibility for vigilance." His team has published a study looking at whether AI could design proteins that do the same thing as proteins that are known to be dangerous, but are different enough that they wouldn't be recognised as dangerous.
Improved Therapeutic Antibody Reformatting through Multimodal Machine Learning
Xin, Jiayi, Raghu, Aniruddh, Bhattacharya, Nick, Carr, Adam, Montgomery, Melanie, Elliott, Hunter
Modern therapeutic antibody design often involves composing multi-part assemblages of individual functional domains, each of which may be derived from a different source or engineered independently. While these complex formats can expand disease applicability and improve safety, they present a significant engineering challenge: the function and stability of individual domains are not guaranteed in the novel format, and the entire molecule may no longer be synthesizable. To address these challenges, we develop a machine learning framework to predict "reformatting success" -- whether converting an antibody from one format to another will succeed or not. Our framework incorporates both antibody sequence and structural context, with an evaluation protocol that reflects realistic deployment scenarios. In experiments on a real-world antibody reformatting dataset, we find the surprising result that large pretrained protein language models (PLMs) fail to outperform simple, domain-tailored, multimodal representations. This is particularly evident in the most difficult evaluation setting, where we test model generalization to a new starting antibody. In this challenging "new antibody, no data" scenario, our best multimodal model achieves high predictive accuracy, enabling prioritization of promising candidates and reducing wasted experimental effort.
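A minimal sketch of the kind of simple multimodal model the abstract describes: concatenate a sequence representation with a handful of structural context features and predict a success logit. Feature choices, dimensions, and names are assumptions, not the authors' exact pipeline.

```python
import torch
import torch.nn as nn

class ReformatSuccessClassifier(nn.Module):
    """Fuse sequence and structure features; output a reformatting-success logit."""
    def __init__(self, seq_dim=64, struct_dim=8, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(seq_dim + struct_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, seq_feats, struct_feats):
        fused = torch.cat([seq_feats, struct_feats], dim=-1)
        return self.mlp(fused).squeeze(-1)  # logit for P(reformatting succeeds)

model = ReformatSuccessClassifier()
seq_feats = torch.randn(16, 64)    # e.g. pooled per-domain sequence embeddings
struct_feats = torch.randn(16, 8)  # e.g. domain-interface structural descriptors
probs = torch.sigmoid(model(seq_feats, struct_feats))
print(probs.shape)  # torch.Size([16]) -- one success probability per candidate
```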
ProtTeX-CC: Activating In-Context Learning in Protein LLM via Two-Stage Instruction Compression
Fan, Chuanliu, Ma, Zicheng, Gao, Jun, Yu, Nan, Zhang, Jun, Cao, Ziqiang, Gao, Yi Qin, Fu, Guohong
Recent advances in protein large language models, such as ProtTeX, represent both side-chain amino acids and backbone structure as discrete token sequences of residue length. While this design enables unified modeling of multimodal protein information, it suffers from two major limitations: (1) The concatenation of sequence and structure tokens approximately doubles the protein length and breaks the intrinsic residue-level alignment between modalities. (2) Constrained by the training corpus and limited context window, ProtTeX is typically trained on single-protein inputs, rendering it incompatible with in-context learning (ICL) and thus limiting its generalization capability. To address these issues, we propose ProtTeX-CC, a lightweight two-stage compression framework designed to enhance ProtTeX under few-shot settings. We first design a joint embedding compression mechanism that fuses sequence and structure representations at the residue level, effectively reducing the protein input length by half without sacrificing performance. Then we propose a self-compression module that aggregates each full demonstration into the latent space of the last few linguistic tokens, reducing the average demonstration length from 751 tokens to less than 16 tokens. Compared to the original ProtTeX, our self-compression approach achieves a compression ratio of approximately 93.68% in the total prompt length under the 16-shot setting. Without modifying the backbone model, ProtTeX-CC introduces only a small number of additional parameters through PEFT-based tuning in the joint embedding compression stage and a single trainable projection layer in the self-compression stage. Extensive experiments on protein function prediction show that ProtTeX-CC improves performance on the in-domain benchmark by 2%, and generalizes well to the out-of-domain dataset with a performance gain of 11%.
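To illustrate the first stage, here is a minimal sketch of residue-level fusion: instead of concatenating sequence tokens and structure tokens into an input of length 2L, the two embeddings of each residue are fused into a single vector, giving length L. Module names and dimensions are illustrative, not ProtTeX-CC's actual implementation.

```python
import torch
import torch.nn as nn

class JointEmbeddingCompressor(nn.Module):
    """Fuse per-residue sequence and structure embeddings into one token each."""
    def __init__(self, d_model):
        super().__init__()
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, seq_emb, struct_emb):
        # seq_emb, struct_emb: (batch, L, d_model), aligned residue-by-residue,
        # exploiting the intrinsic one-to-one correspondence between modalities.
        return self.fuse(torch.cat([seq_emb, struct_emb], dim=-1))  # (batch, L, d_model)

batch, L, d = 2, 128, 256
seq_emb = torch.randn(batch, L, d)     # embeddings of sequence tokens
struct_emb = torch.randn(batch, L, d)  # embeddings of structure tokens
fused = JointEmbeddingCompressor(d)(seq_emb, struct_emb)
print(fused.shape)  # torch.Size([2, 128, 256]) -- half the 2L concatenated length
```

The second stage (self-compression of each demonstration into a few latent tokens) would then operate on sequences of such fused embeddings, which is what brings the average demonstration from 751 tokens down to under 16.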